272 ◾ Bioinformatics
BioProject by the above accession number, or simply copy and paste the following URL into
your web browser:
https://www.ncbi.nlm.nih.gov/sra/?term=PRJEB24421
Then, use the “Send to” drop-down menu to download the runinfo text file. After downloading
the text file, open it in Excel, delete all columns except the one containing the run
accessions, remove the column name as well, and save the file as “runids.txt” in the
“data” subdirectory.
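The spreadsheet step can also be done on the command line. The following is a minimal sketch, assuming the downloaded runinfo file is comma-separated and has a header column named “Run”; the function name extract_run_ids and the file name SraRunInfo.csv used below are illustrative assumptions, not names from the text.

```shell
# Print the run accessions from a runinfo CSV file, one per line.
# Assumption: the header row has a column named "Run" that precedes any
# quoted field containing commas (it is normally the first column).
extract_run_ids() {
    awk -F, '
        # Locate the "Run" column in the header row.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Run") col = i; next }
        # Skip blank lines and repeated header rows; print the accession.
        col && $col != "" && $col != "Run" { print $col }
    ' "$1"
}
```

If the downloaded file is named SraRunInfo.csv, it could be used as “extract_run_ids SraRunInfo.csv > data/runids.txt”.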
Alternatively, you can use the following EDirect script, which extracts the run accessions
and stores them in a file named “runids.txt” in the “data” subdirectory (NCBI Entrez Direct
must be installed):
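# Fetch the run table for the BioProject and keep only the first column (Run)
esearch -db sra -query 'PRJEB24421[bioproject]' \
| efetch -format runinfo \
| cut -f1 -d, > data/runids.txt
# Remove blank lines and header lines, leaving only the run accessions
sed -i '/^$/d' data/runids.txt
sed -i '/^Run/d' data/runids.txt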
Check that the file has been saved successfully by using the “ls data/” command, or
display its content with the “vim data/runids.txt” command.
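A quick sanity check of the accession list can catch leftover header lines or duplicates before any download starts. The sketch below is one possible way to do it; the function name check_runids is hypothetical, and the pattern assumes run accessions of the usual form SRR, ERR, or DRR followed by digits.

```shell
# Report basic statistics for an accession list and succeed only if it
# contains the expected number of well-formed, unique run accessions.
# $1: file with one accession per line; $2: expected number of accessions.
check_runids() (
    file=$1
    expected=$2
    # Count non-empty lines.
    total=$(grep -c . "$file")
    # Count non-empty lines that do not look like a run accession.
    bad=$(grep . "$file" | grep -Evc '^[SED]RR[0-9]+$')
    # Count accessions that occur more than once.
    dups=$(sort "$file" | uniq -d | grep -c .)
    echo "accessions=$total malformed=$bad duplicated=$dups"
    [ "$total" -eq "$expected" ] && [ "$bad" -eq 0 ] && [ "$dups" -eq 0 ]
)
```

For this project, “check_runids data/runids.txt 86” should report 86 accessions with no malformed or duplicated entries.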
After saving the 86 run accessions in the “data/runids.txt” file, you can download the
raw FASTQ files from the NCBI SRA database. Either save the following script in a bash
file “download.sh” and run it as “bash download.sh”, or enter the script directly at the
terminal command-line prompt while you are in the project directory:
while read -r f
do
    fasterq-dump --progress --outdir data "$f"
done < data/runids.txt
You will see the download progress. The files require only 771.29 MB of storage space.
The 172 FASTQ files will be downloaded to the “data” subdirectory, two files for each
sample. When the files have been downloaded successfully, you can check the content
of the “data” subdirectory and count the FASTQ files using the following command:
ls data/*.fastq | wc -l
The number of files should be 172. If it is not, you may need to run the download script
again.
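When the count is short, it can help to know which runs are incomplete, so that only those are downloaded again. The following sketch assumes the paired files are named after the run accession with the suffixes “_1.fastq” and “_2.fastq”, which is the usual fasterq-dump naming for paired-end runs; the function name list_incomplete_runs is hypothetical.

```shell
# Print every accession that lacks a non-empty pair of FASTQ files.
# $1: file with one accession per line; $2: directory holding the FASTQ files.
list_incomplete_runs() (
    ids=$1
    dir=$2
    while IFS= read -r run; do
        # Ignore blank lines in the accession list.
        [ -n "$run" ] || continue
        # A run is complete only if both mate files exist and are non-empty.
        if [ ! -s "$dir/${run}_1.fastq" ] || [ ! -s "$dir/${run}_2.fastq" ]; then
            echo "$run"
        fi
    done < "$ids"
)
```

For example, “list_incomplete_runs data/runids.txt data > data/missing.txt” writes the incomplete runs to a new list (the file name missing.txt is only an example), which can then replace “data/runids.txt” as the input of the download loop.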
7.3.3.2 Creating the Sample Metadata File
Open the NCBI SRA page using the above URL. Then, open Run Selector from the “Send to”
drop-down menu. All runs will be displayed in the Run Selector. Click the “Metadata” button